Two-ways Adaptive Failure Detection with the φ-Failure Detector
Authors
Abstract
It is widely recognized that distributed systems would greatly benefit from the availability of a generic failure detection service. Such a service, however, is useful only if it can adapt simultaneously to changing network conditions and to conflicting application requirements. This paper presents a novel approach to adaptive failure detection, called the φ-failure detector, which dynamically adapts to application requirements as well as to network conditions. The key idea is as follows. Traditionally, failure detectors maintain a set of suspected processes. The information is hence boolean in nature: a process p is suspected if and only if it belongs to this set. In contrast, a φ-failure detector associates a value φ_p with every known process p. The value φ_p increases on a normalized scale that represents the degree of confidence that process p has crashed. The scale is dynamically adapted to the current network conditions, and each application can trigger suspicions according to a threshold that corresponds to its own requirements. We describe a possible implementation of such a service; because this work is still in progress, some specific questions remain open.
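The abstract does not say how φ_p is computed, so the following is only a minimal sketch under stated assumptions. It assumes that process p sends periodic heartbeats, that inter-arrival times are roughly normally distributed, and that φ is defined as the negative base-10 logarithm of the probability that the next heartbeat is merely late (the formulation commonly associated with accrual failure detectors). The class name, method names, window size, and `min_std` floor are all hypothetical.

```python
import math
from collections import deque


class PhiFailureDetector:
    """Illustrative sketch only; not the paper's implementation.

    Assumptions: heartbeat-based monitoring, normally distributed
    inter-arrival times, phi = -log10(P(heartbeat is still on its way)).
    """

    def __init__(self, window_size=100, min_std=0.1):
        # Sliding window of recent heartbeat inter-arrival times; this is
        # what lets the scale adapt to current network conditions.
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None
        # Floor on the standard deviation so a perfectly regular history
        # does not cause a division by zero.
        self.min_std = min_std

    def heartbeat(self, now):
        """Record the arrival of a heartbeat at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Return the current suspicion level for the monitored process."""
        if not self.intervals:
            return 0.0  # not enough history to estimate anything
        n = len(self.intervals)
        mean = sum(self.intervals) / n
        var = sum((x - mean) ** 2 for x in self.intervals) / n
        std = max(math.sqrt(var), self.min_std)
        elapsed = now - self.last_arrival
        # Probability that a heartbeat takes longer than `elapsed` to arrive,
        # i.e. the upper tail of the normal distribution.
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        p_later = max(p_later, 1e-300)  # guard against log10(0)
        return -math.log10(p_later)

    def suspects(self, now, threshold):
        """Each application supplies its own threshold."""
        return self.phi(now) >= threshold


# Usage: one detector, two applications with different requirements.
detector = PhiFailureDetector()
for t in range(11):  # heartbeats at t = 0, 1, ..., 10 seconds
    detector.heartbeat(float(t))

aggressive = detector.suspects(11.2, threshold=1.0)    # fast detection
conservative = detector.suspects(11.2, threshold=8.0)  # few false suspicions
```

The point of the design is that the detector only outputs a value; the boolean decision is left to each application, so an application that wants quick detection and one that wants few wrong suspicions can share the same monitoring state.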
Similar resources
Implementation and Performance Analysis of the φ-Failure Detector
Failure detection is a fundamental building block for ensuring fault tolerance in distributed systems. However, providing accurate and flexible failure detection in off-the-shelf distributed systems is difficult. Practical solutions to failure detection rely on some adaptive mechanism to cope with the unpredictability of networking conditions. However, while they provide reasonably good accurac...
The Φ Accrual Failure Detector
Detecting failures is a fundamental issue for fault-tolerance in distributed systems. Recently, many people have come to realize that failure detection ought to be provided as some form of generic service, similar to IP address lookup or time synchronization. However, this has not been successful so far. One of the reasons is the difficulty to satisfy several application requirements simultaneo...
Self-healing in payment switches with a focus on failure detection using State Machine-based approaches
Composition, change and complexity have attracted everyone’s attention towards Self-Adaptive systems. These systems, inspired by the human body, are capable of adapting to changes in the inner and outer environment. The main objective of this study is to achieve a more convenient availability for e-banking services in the payment switch, using self-healing systems and focusing on the failur...
On the Design of a Failure Detection Service for Large-Scale Distributed Systems
It is widely recognized that distributed systems would greatly benefit from the availability of a generic failure detection service. There are however several issues that must be addressed before such a service can actually be implemented. In this paper, we highlight the main issues related to ensuring failure detection in large-scale systems, and overview the main solutions proposed in the lit...